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Abstract 

Generalized Gibbs kernels are those that may take any direction not nec- 
essarily bounded each axis along the parameters of the objective function. We 
study how to optimally choose such directions in a Directional, random scan, 
Gibbs sampler setting. The optimal direction is chosen by minimizing to the 
mutual information (Kullback-Leibler divergence) of two steps of the MCMC 
for a truncated Normal objective function. The result is generalized to be 
used when a Multivariate Normal (local) approximation is available for the 
objective function. Three Gibbs direction distributions are tested in highly 
skewed non-normal objective functions. 

Keywords: MCMC; Bayesian inference; simulation. 

1 Introduction 

Let 7r{x) be the objective distribution of interest. Let X G be a random variable 
with density 7t{x). The (univariate) Gibbs sampler is a MCMC sampling algorithm 
that simulates systematically or randomly from the conditional distributions 

fx,\X_A^i\X-i) OC 7t{x), (1) 

where the notation 

V-i = {Vi, . . . ,Vi_i,Vi^i, ...,Vn) 

represents the (n — l)-dimension vector created by deleting the i-th entry from 
the n-dimension vector v. ([T]) above represent univariate distributions and are the 
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conditional distributions alone the axis for the base chosen to represent X, the so 



called in. 



1 conditional distributions. 



Using ([T) a Markov chain ... is created considering the transition kernel 

That is, the z-th kernel changes only the z-th coordinate by simulating from the full 
conditional distribution fxi\X-i{'\x-i)- In random scan Gibbs, a complete transition 
kernel is defined as 

n 

1=1 

for some weights Yl^=i '^i = 1; '^i ^ 0. 

The direction Gibbs sampler generalizes this idea by choosing a direction e G M^, 
||e|| = 1, and sampling from the conditional distribution alone such direction. This 
can be written as 

where the length r G M has distribution proportional to ti{x^'^^ + re). In any case, 
it can be seen that the transition kernel has detailed balance with respect to tt, and 
by assuring vr-irreducibility the Markov chain has tt as ergodic distribution. 

The natural question to ask is how to choose e to optimize the convergence 
(mixing) of the Markov chain? Indeed, once irreducibility is assured, any chain 
will have the correct ergodic distribution (tt) but performance will depend on how 
dependent X^^^^^ and X^^^ are. In this context a convenient, although less known, 
dependence measure to use is the mutual information I between random variables 
X and y, which measures the KuUback-Leibler divergence between the joint model 
fx,Y and the independent alternative fxfr^ that is, 

/(r,x)= / [ fYMy^^)^og !Y^y'''\ dxdy. 

J J jY{y)Jx{x) 

In our case, this translates to X = X^^^ and Y = X^^^^\ Assuming that X ^ tt, 
we see that fviv) = ^{v) and /r,x(2/,^) = 7T{x)K{x^y). Therefore, the mutual 
information above can be calculated as 

IiY,X) = J j n{x)K{x,y)\og^^^^dydx. (2) 

The idea would be to choose directions for which I{X^^^^\X^^^) is minimized. 
Since / is a KuUback-Leibler divergence it is well defined, / > and / = if and 
only if X^^^-^^ and X^^^ are independent {fx,Y = fxfr a.s.). Finding directions e 
for which / is minimum will provide our optimization criterion, to obtaining better 
suited direction Gibbs samplers. 
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2 The Normal case 



In many applications, a Multivariate Normal approximation to the posterior is pos- 
sible. We will try to use such Normal approximation to produce an improved Gibbs 
samplers. In this section we will assume tt to be Normal, a case that lends itself 
to calculate I{X^^^^\ X^^''). For the moment, we will consider that the dimension 
is sufficiently low so as to have the eigen decomposition of the precision matrix 
available. 

We therefore assume that tt is a multi- variate Normal with precision {n x n) 
matrix A (the inverse of the variance-covariance matrix) and mean column vector 
/X. As explained above, given a direction eGM^, ||e|| = l, the chain moves from 
X(') = xto 

Y = x('+^) =x + re, 
where the length r G M has distribution g proportional to 7r(a3 + re). That is 

g{r) oc exp |~^('^ + r^yA{v + re)| , 

where v = x — [i. After some algebra we see that r ^ N{-^, e'Ae), where e'Ae 
is the precision (by setting e = e^, the i-th standard bases vector, one obtains 

Yir^ N [iii - a';^{v'v - {xi - iiif), an) , 

which is the full conditional distribution for a Multivariate Normal distribution for 
entry i, thus returning to the usual Gibbs sampler). From this we see that the 
transition kernel corresponding to direction e is 

, , (e'AeY \ e'Aef ,^ , e' AvV\ ^ ^ 
Ke[x,y) = i^;^ I exp <^ — I e {y - x) + 1 l{y = x + e {y - x)e) 

(note that y — x = re and since e'e = 1, r = e'{y — x); therefore y is restricted to 
the line y = x + e'{y — x)e). 

We calculate Ie{X^^^^\ X^^^), the mutual information of the Gibbs sampler given 
direction e, as in [2l Note that 

log ^^^^ = C+\\og e'Ae - \ {Q,{e, x, y) - Q,{y)) , 



where 



77 — 1 1 

C=^log27r--log|A|, 

Qi{e,x,y) = e'Ae(^\y-x) + ^^^ , 
Q2{y) = {y- n)'A{y- n). 
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From this we see that 



since J Qi{e^x^y)Ke{x^y)dy = 1. The integral J Q2{y)Ke{x^y)dy may be calcu- 
lated by transforming back to r since 

J Q2{y)Ke{x, y)dy = J i^^ - vyA{re - v)g^{r)dr. 

After some algebra one sees that 

f ^ ^ ^ \ 1 -1 v^Aee^Av , . 
/ Q2{y)Ke{x, y)dy = 1 — + v'Av. 



Therefore 



/ 



^iy) = C7 + - log e'Ae - - ^^^^ + v'Av. 



We need now to integrate with respect to 7T{dx). We note that J v^Av7r{x)dx — n. 
Moreover, the expected value of a quadratic form is 

E{z'Rz) = tr(i2S) + ^I'R^i, 

where /x and S are the mean vector and the variance-covariance matrix of z. Letting 
R = and since E{v) = we obtain 



E ( v'^^^v] = -^triAee'AA-^) 
\ e'Ae J e'Ae ^ ^ 



Therefore 



= 1. 



/e(X(*+i), XW) = C + n- ^ + ^ log e'Ae 



= Ci + ^ log e'Ae, (3) 



where Ci — C -\- n — \. 
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2.1 Choosing a set of directions 



We need now a distribution h for directions to be chosen to generate an irreducible 
Gibbs sampler. That is, according to ^ the best direction is that that minimizes 
Ci + 1 log e'Ae, but we cannot simply choose the best direction: the resulting Gibbs 
will not be irreducible and clearly we will not be sampling from tt. The chain most 
be TT-irreducible in order that the chain has as ergodic distribution tt. Indeed, if 
directions have distribution h{e)^ and this distribution has the whole sphere as 
support, then the resulting Markov chain K{x^ y) = J K^{x^ y)h{e)de is irreducible. 



Kaufman and Smith (1994) argue that an optimal direction distribution is 

n-l 



h{e) oc sup < / 7t{x + Te)dT 



as far as optimizing the (geometric) rate of convergence of the resulting Gibbs sam- 
pling. However, this only applies for a bounded support X for tt. Little else has 
been said regarding the optimal directions for generalized Gibbs samplers. We can- 
not control in general the term ^^^^^^^^^ for unbounded support. However, this suggest 
choosing a direction 



/i*(e) oc sup < / Ti{x + Te)dT \ . 
For the normal case, it is not difficult to see that 



/ 



-1 , . , 1 



{2ti)~^\A\2 

7t[x + re)dr < -== exp 

Ve^Ae 




From this we take /i*(e) oc (e'Ae) 2. On the other hand, one can minimize 
/e(X(^+^\x(^)) by maximizing exp{-/e(X(^+^\ X^^^)}. Then, choosing 

/i*(e) oc (e'Ae)-^ 

will naturally choose directions with low Ie{X^^^^\ ^^^^), as can be seen from ([3]). 

To sample from /i*(e), it is also easy to see that /i*(e) oc J 7r(/x + re)rfr oc 
[e' Ae)~^ (the maximum is achieved when x = jj.). If we simulate Su from a 
Multivariate Normal with precision matrix A and centered at the origin, ttq, and 
take e = it is clear that e will have density oc J 7ro(re)rfr oc (e'Ae)~2. That 
is, e ^ /i*, and this density has the whole sphere §^ as its support and this results 
in an ergodic chain. 



3 Non-normal objective function 

So far nothing has been achieved (since for sampling from a multivariate normal 
several samples form basically the same MN distribution are needed!). But indeed, 
the aim is to produce an efficient sampler when a (local) Multi- variate Normal 
approximation exists. Note from the previous section that our Optimal Direction 
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Gibbs algorithm only requires knowledge of the precision matrix A, not of the mean 
jjL (to sample the direction from /i*). That is, only an (local) approximation to the 
the log of the objective function will be needed. 



3.1 Truncated Normal objective function 

Sometimes the quantities we want to determine are different from the ones which we 
are able to measure. If the data measured depends on the quantities we want, then 
the data contains some information about those quantities. Inverse problems occurs 
when observed data y depend on unknowns x via a measurement process, and we 
want to recover x from y. In linear inverse problems, the relationship between y 
and X is given by 

Bx = y, 

where S is a linear transformation. The classical inverse problem is to invert the 
function B to obtain unknowns x in terms of data. Suppose that Bj^xn^nxi = Umxi 
is a linear inverse problem where the matrix B is known. Suppose that y\x ^ 
Njri{Bx^ T) with T = diag{\^ . . . , ^), and suppose x has a Truncate Multivariate 

Normal Distribution with known mean vector /x and precision matrix A, where 
> Vi We are interested in find the posterior distribution of x\y. After some 
algebra, we see that 

f{x\y) oc exp \^-\{{^ - M*)'A*(x - 
• l{x^ > 0) 

is a Truncate Multivariate Normal Distribution, with precision matrix 

A* = {A + B\mT)B), 

and mean vector 

/X* = (A + B\mT)B)-\fi'A + y\mT)B). 

We take a Multivariate Normal distribution tt with precision (n x n) matrix A 
and mean vector /x = {^\J\) • • • , \J^^ ^ut truncated the support to > (all 
entries are positive, our objective function is oc ti{x)). A random direction Gibbs 



sampling is considered with direction distribution /i* as in Section |2.1[ We set the 
initial point X^°^ = /x and the burn in is not needed. The full conditionals are the 
same as in Section [2] but bound by the positivity constraint. 

We consider several dimensions n = 2,3,5, 10, 15, 20. In each case, we compare 
with a standard random scan (alone the base axis) Gibbs sampling, a pure random 
direction Gibbs and directions simulated from /i*. Using the QR decomposition of a 
n X n matrix of uniform random entries one obtains a random orthonormal matrix 
P, which will represent the orthonormal base of eigenvalues. The eigenvectors of 

Ai, • • • , A^, represent the precisions on each eigenvalue direction and are set to 
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Xi = a- . The standard deviations in each principal (eigen) direction are set to 

ai = i^, (4) 

a >= 0, and A = P'KP^ where A = diag{Xi). These represent decreasing 
standard deviations and are increasingly contrasting as a increases. More con- 
trasting standard deviations result in further correlated distributions. We consider 
a = 0,5, 10, 20. 

For each combination of n and a we calculate the Integrated Autocorrelation 
Time of the resulting chains. Results are shown in Figure [T] Note that the random 
scan and random direction Gibbs worsen in performance as the a^'s are more con- 
trasting, our Optimal Direction Gibbs (ODG) remains with an lAT/n < 12. Note 
that in this general case, the ODG is a random walk; the resulting lAT's are quite 
close to the theoretical optimum for an optimal scaled random walk, as explained 



in (Roberts and Rosenthal, 2001). 



3.2 More general objective functions 

Suppose then that 7t{x) is the objective distribution of interest with support X C 
M^. Let X G be a random variable with density 7r{x). Suppose also that for each 
X G X we have the Hessian H{x) for — log7r(-) evaluated at x. By completing the 
squares in the second order taylor approximation to — log7r(-), we assume sufficient 
smoothness such that 

- log7T{y) ^ C{x) + ^{y - ii{x)yH{x){y - 

where iJi{x) = x — H~^{x)\/{x) (V(a3) is the gradient of — log7r(-) evaluated at 
x). That is, we have a local MN approximation to tt with mean vector iii{x) and 
precision matrix H{x). 

The proposed algorithm runs as follows. At X^^^ = propose jumping to 

Y = X -\- re (5) 

where the length r G M has distribution r ^ N (^—^^^^^^e' H{x)e^ and v = 
X — iJi{x) = X — X + H~^{x)\/{x). This translates to 

As usual, the proposal Y will be accepted with a Metropolis-Hastings probability. 
In the case that tt is a MN as in Section [2] it is clear that \/{x) = A{x — [i)^ 
H{x) = A, therefore v = A{x — /x) and we are back with the same sampler. 
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Dimension Dimension 



(c) (d) 

Figure 1: Integrated autocorrelation times divided by the dimension of the MN 
objective function. Direction gibbs for random scan (blue), random direction (green) 
and direction according to h* (red). The standard deviations for the MN are set 
according to ^ for (a) a = (independent normals) (b) a = 5, (c) a = 10 and (d) 
a = 20. 



8 



The Metropolis-Hasting ratio would be 



Ti{x) q{y\xy 

where q{-\x) = h^{e)qe{-\x) is the density of Y in In Section [2] ge(-|a3) is the 
conditional distribution over the direction e, given x (gibbs kernel), and cancels 
out with Ti{x) {^^{e) would cancel out with h^{—e) since the Hessian is constant; 
H{x) = A). Let g{x) = log7r(a3), then 

logi? = {g{y)+ log q{x\y)) - {g{x) + log q{y\x)). 

Let 

^x{e,r) = g{x) + logq{x + re\x). 
Note that q{x + re\x) = h^{e)q^{x + re\x) and therefore 



27r 

f e'H(x)e ( elVix) 
• exp <^ r + ^ ^ 



e'H{x)e 



and 

V'a.le, r) = + log/CH(a,) ^ ( r + ^7^^^ 

where 

i^^U = / {e'H{x)e)-Ue 

J\\e\\=l 

is the normalizing constant of /i* that now depends on x. We then have 

R{x, e, r) = exp{?/;a,+^e(-e, r) - ?/;a,(e, r)}. (6) 

Therefore, the probability of accepting a jump from x to y = x -\- re should be 
min{l, i?(a3, e, r)}. However, it is not possible to obtain Kh{x) analytically. For 
dimension n = 2 we calculate Kh{x) numerically, only for comparison purposes. 

An alternative distribution of directions would be the following. Assuming that 
A = H{x) is positive defined, we can use its eigenvalues and eigenvectors. We will 
take the directions e as the eigenvectors of matrix H{x)^ so e G {ei,e2, . . . 
The i-th direction will be selected with probability proportional to A^~^, where 
is the eigenvalue corresponding to the i-th eigenvector , i = 1,2,. ..,n . Then 
hi{ei) = {kXi)~^ ^ where k = X^^^^ A^""^. By doing this the distribution of e becomes 
discrete. Moreover, we avoid the problem we have with de normalization constant 
KH{e) cind the implementation can be done for large values of dimension n. 

It lis easy to see that direction ei, corresponding to the lowest eigenvalue A^ of 
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OL 


S 


lAT 


Acceptance rate 


a) 


(-1.0,-1.0) 


(1.0,0.5,0.5,1.0) 


4.039870 


0.8712 


b) 


(-0.5,5.0) 


(1.0,0.9,0.9,1.0) 


6.944900 


0.7693 


c) 


(-5.0,5.0) 


(1.0,0.9,0.9,1.0) 


3.909896 


0.8485 


d) 


(-10.0,-10.0) 


(1.0,0.5,0.5,1.0) 


7.629253 


0.6892 



Table 1: Acceptance rate and I AT for simulations obtained with the Optimal 
Direction Gibbs algorithm using distribution /i*. 



H{x)^ is optimal indeed. Note that 

= min \c +-\og{e'H{x)e)\ 

||e||=l V / ||e||=l 2 J 

= C +\\og (min {e'H{x)e} \ = C + JlogAi. 

2 y||e||=l J 2 

The minimum is reached when e = ei, the eigenvector associated to Ai. 

3.3 Example: The skew-normal distribution 

To test the performance of our algorithm we will use as objective distribution a vari- 
ant of the skew-normal distribution. We choose this distribution due its simplicity 
and versatility, plus the fact that it allows us to calculate analytically the gradient 
and the Hessian of — logvr. 

The multivariate skew-normal probability density function is given by 

where (/)^(-;S) is the multivariate normal probability density function with mean 
vector and variance-covariance matrix S, $(•) is the standard univariate normal 
cumulative distribution function, ^ is a localization parameter and a is a shape 



parameter (Azzalini and Adelchi, 2005). 



$(•) is called perturbation function and may be the cumulative distribution func- 
tion of any univariate symmetric distribution around 0. In our case, we are going to 
use the Logistic cumulative distribution function with mean and scale parameter 
5 = — , which is 



G{x) 



1 



1 + exp { — TTx/ \/3} 
Then, the density function of our objective distribution is given by 
\A\ ^ 



tt{x) = 2 



exp 



2 J 1 + exp |— 7ra'cc/v3| 
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alpha = (-1.0, -1.0) Sigma = matrix(1.0, 0.5, 0.5, 1.0) 



alpha = (-0.5, 5.0) Sigma = matrix(1.0, 0.9, 0.9, 1.0) 



3 
2 

(N 

X 1 

-1 



-4 -3-2-10 1 2 

XI 



1 2 
XI 



a) 



b) 



alpha = (-5.0, 5.0) Sigma = matrix(1.0, 0.9, 0.9, 1.0) 



alpha = (-10.0, -10.0) Sigma = matrix(1.0, 0.5, 0.5, 1.0) 



3- 
2 
1 


< 

-1 
-2 
-3 

-4- 




-3-2-10 1 2 3 
XI 




c) 

Figure 2: Simulations obtained with the Optimal Direction Gibbs algorithm using 
distribution /i*. 
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where A = T, ^. In order to implement the algorithm we need to calculate the 
gradient and Hessian of — logn^x). It is easy to see that 

\/{x) = Ax-{l- G{cx'x)) a, 

and 

H{x) = A + g{oLx)oLOL', 

where g{-) is the Logistic probability density function. 

To test the algorithm we consider the case n = 2. Figure [2] shows the contour 
plots of the objective distribution for different values of S and ex. It also shows 
the simulations obtained using the algorithm with distribution /i*, after 10,000 
iterations. We set the initial point X^°^ = and the burn in is not needed. Table 
[T] shows the acceptance rate and lAT for each example. Note that the value of the 
I AT is small and the acceptance rate is large in all cases. Both measures indicate 
that the performance of the algorithm is quite good. Is important to mention that a 
conventional Gibbs sampler cannot be used for this example since the full conditional 
distributions do not have any close form. Our scheme has not that kind of difficulties. 

Unfortunately, the implementation is not as simple for dimension n > 3 since we 
need to calculate the normalization constant Kh(x) numerically. 



In Subsection |3.2| we proposed an alternative direction distribution hi to avoid 
calculating Kh{x) • We test the algorithm using this distribution on the same exam- 
ples. Figure |3] shows some very interesting results. Simulations in examples a) and 
b) are well distributed over the region of interest. However, in examples c) and d) 
simulations are concentrated in a specific region. This results from the fact that the 
eigenvalues of H{x) are very contrasting, one is much larger than the other, and 
consequently, simulations are obtained in similar directions constantly. 

This suggests that the scheme works well in simple cases, but it becomes ineffi- 
cient against more skewed distributions. However, we can still modify hi. We take 
the directions e as eigenvectors of H{x)^ but the probability of choosing will now 
be proportional to (A^)~^, where b ^ 5eta(l,9), this is h2{ei) oc (A^)~^. By doing 
this we allow the chance that probabilities become more balanced in regions where 
there is a very dominant eigenvalue. 

Figure [i] shows the behaviour of the resulting algorithm. Examples a) and b) 
show a good performance as before. On the other hand, examples c) and d), which 
we identified as difficult above, seem to have been corrected. Table [2] shows the 
acceptance rate and I AT. Here we see good results. Acceptance rates remain high 
and do not represent a problem. The lAT is slightly higher than in Table [l} except 
by example c); however, the difference is not quite significant. 

Direction distribution /12 seems a good alternative for our optimal direction dis- 
tribution since it shows good performance and the lAT values are very similar to 
those obtained with our original scheme. The parameters of the Beta distribution 
were selected intuitively. In fact, those parameters allows us to change distribution 
/12 in order to get better results. 
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alpha = (-1.0, -1.0) Sigma = matrix(1.0, 0.5, 0.5, 1.0) 



alpha = (-0.5, 5.0) Sigma = matrix(1.0, 0.9, 0.9, 1.0) 






OL 




lAT 


Acceptance rate 


a) 


(-1.0,-1.0) 


(1.0,0.5,0.5,1.0) 


4.497301 


0.8793 


b) 


(-0.5,5.0) 


(1.0,0.9,0.9,1.0) 


7.844630 


0.7534 


c) 


(-5.0,5.0) 


(1.0,0.9,0.9,1.0) 


2.705416 


0.8809 


d) 


(-10.0,-10.0) 


(1.0,0.5,0.5,1.0) 


8.821470 


0.7327 



Table 2: Acceptance rate and I AT for simulations obtained with the Optimal 
Direction Gibbs algorithm using distribution /12. 
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alpha = (-1.0, -1.0) Sigma = matrix(1.0, 0.5, 0.5, 1.0) 



alpha = (-0.5, 5.0) Sigma = matrix(1.0, 0.9, 0.9, 1.0) 
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-1 
-2 
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alpha = (-5.0, 5.0) 


Sigma = matrix(1.0, 0.9, 0.9, 1.0) 
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alpha = (-10.0, -10.0) Sigma = 


matrixd.O, 0.5, 0.5, 1.0) 














1 





















x-1 




^\ 
) 














-2 
-3 




-3 -2 -1 


12 3 
XI 

c) 




-4 -3 -2 

XI 

d) 


-1 1 



Figure 4: Simulations obtained with the Optimal Direction Gibbs algorithm using 
distribution /12. 
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4 Discussion 



Our Optimal Direction Gibbs sampler presents interesting characteristics in exam- 
ples where either a conventional gibbs sampler is impossible to implement or o ther 
MCMC methods (eg. a Random Walk Metropolis-Hastings) would be vey difficult 
to tune. The truncated normal example is of great relevance in the field of in- 
verse problems and we have also worked with strongly skewed distributions with 
contrasting scales, with promising results in all cases. 

5 Acknowledgements 

DAPR and MSG thank CONACyT for a MSc scholarship since part of this work 
was conducted while finishing their MSc studies at CIMAT. Part of JAC work was 
founded by CONACyT grant 128477. 

References 

Azzalini, Adelchi (2005) The skew-normal distribution and related multivariate fam- 
ilies. Scandinavian Journal of Statistics 2:159-188 

Kaufman D, Smith R (1994) Direction choice for accelerated convergence in hit- 
and-run sampling. Operation Research 46(l):84-95 

Roberts GO, Rosenthal JS (2001) Optimal scaling for various metropolis-hastings 
algorithms. Statistical Science 16(4):351-367 



15 



